Abstract. - Understanding the structure and evolution of online bipartite networks is a significant
task, since they play a crucial role in various e-commerce services nowadays. Recently, various
models have been proposed, resulting in either power-law or exponential degree distributions.
However, many empirical results show that the user degree distribution actually follows a
stretched exponential decay, which cannot be fully described by previous models. In this Letter,
we propose an evolving model that considers two different user behaviors: random and
preferential attachment. Extensive empirical results on two real bipartite networks, Delicious and
CiteULike, show that the theoretical model characterizes the structure of real networks well for
both user and object degree distributions. In addition, we introduce a structural parameter, p, to
demonstrate that the hybrid user behavior leads to the stretched exponential degree distribution,
and that the region of the power-law tail grows as p increases. The proposed model might shed
some light on the underlying laws governing the structure of real online bipartite networks.
Introduction. - The past decade has witnessed an explosion of research into the
underlying mechanisms of various real-life networks, ranging from the Internet,
scientific collaboration networks and protein networks to social networks [IHZ].
Although each has its own properties and characteristics, empirical analyses show
that networks with such a wide range of functions share many common features,
e.g. a small average distance between nodes, a large clustering coefficient [5],
a power-law degree distribution [3] and community structure [10]. Recently,
studies on the mathematics of networks have been driven largely by those observed
empirical properties of real networks, as well as by network dynamics. However,
many pioneering works in this area focus on designing evolutionary models of
unipartite networks, which have only one kind of node, such as the Erdős-Rényi
network [11], the Watts-Strogatz network [S], the Barabási-Albert network [5J,
as well as many variants considering different factors (e.g. aging effects
[12, 13] and social impact [MHTS]). Recently, with the advent of Web
2.0 and affiliated applications, the family of networks has also received many
new members. One example is the bipartite network, which involves two different
kinds of nodes with different functions [TThT5]. Unlike traditional networks, the
nodes in a pure bipartite network can be divided into two disjoint sets, and
edges are only allowed between nodes of different sets. Nowadays, bipartite
networks are widely applied in online platforms (e.g. online services where users
view/purchase products [20-22] or listen to music [23]), biology [24-27],
medical science [28-30] and theoretical studies. Many recent studies have also
reported that universal properties of unipartite networks, such as power-law
degree distributions and correlations [T7J[T!5] and community structure p£fti58],
can also be found in bipartite networks. Consequently, bipartite networks have
attracted increasing attention from the scientific community owing to their wide
application and their promise in characterizing the essential properties of real
networks. The first and most natural approach is to project the bipartite network
onto a corresponding unipartite network and then use
methods for traditional networks [3HH412]. However, it is
argued that such one-mode projection ignores much informative structure and many
relationships, and consequently gives unreliable or incorrect results
[36, 43, 44]. Therefore, a more common approach is to keep the original bipartite
structure, investigate both its specific and common properties, and try to
uncover the underlying mechanism driving the emergence of this two-mode network.
Newman et al. used the random graph model to describe social networks of both
unipartite and bipartite relations [43]. Using generating functions [35], they
concluded that the clustering and average degree of real affiliation networks,
one typical kind of bipartite network, agreed well with the theoretical
prediction. Lambiotte et al. proposed a personal identification and community
imitation (PICI) based model to consider the effects of both collective behavior
and personalization [23]. This model generated an exponential and a power-law
degree distribution for music groups and owners, respectively. Sood and Redner
introduced the voter model on networks with power-law degree distributions, with
and without degree correlation, and showed in both cases that the consensus time
depends strongly on the value of the exponent [35]. Noh et al. demonstrated that
different mechanisms generate different shapes of degree distributions in group
selection systems [47]. That is to say, a random selection process results in an
exponential distribution of the activity degree, whereas a power-law distribution
of group size and activity degree arises from the combined effect of preferential
selection and fixed-probability creation. Sneppen et al. proposed a minimalistic
model of directed bipartite networks, in which a self-organization phenomenon was
observed through a dynamical reconnection process [35]. A similar result was also
found in collaboration bipartite networks via preferential attachment on actors'
degree [H]. However, this model only reproduced a power-law distribution for one
kind of node and neglected the other kind. Saavedra et al. introduced two
mechanisms, specialization and interaction, which produce exponential degree
distributions for both sides [H5]. In addition, they found that this bipartite
cooperation characterizes the structure of both ecological and organizational
networks well.

In this Letter, we focus on studying the degree distribution of online bipartite
networks where users view/choose/select objects (e.g. bookmarks, music, movies),
as well as the underlying mechanisms. Although many previous studies demonstrated
that both exponential and power-law degree distributions can be obtained by
corresponding models, empirical analysis of online bipartite networks shows that
the user degree distribution follows a stretched exponential instead of a pure
exponential decay, while the object degree distribution always obeys a power law
[19] [50], which cannot be fully explained by previous models. Therefore, we
propose an evolutionary model that considers the proactive selection activity of
users and the passive pattern of objects. Theoretical analysis shows that the
present model not only reproduces the two different degree distributions well,
but also agrees well with two real-world data sets, Delicious [footnote 1] and
CiteULike [footnote 2]. In addition, we find that the structural parameter, p,
determines the transformation from exponential to power-law decay of the user
degree distribution.

Model. - In this section, we propose an evolving model to uncover the growth
dynamics of online bipartite networks. Here, we mainly consider two mechanisms:
random and preferential attachment. In particular, we assume there are two kinds
of online behavior for a user: she can either choose an object at random or pick
an item according to its popularity. On the one hand, a new user entering the
system would find it difficult to select a suitable object from numerous
candidates. One reasonable action is to choose a popular item, since other users
also like it. On the other hand, old users who have spent much time on the online
platform know how to find their own favorites and are thus likely to select
personalized (hence possibly less popular) items. That is to say, users are very
proactive in performing online activities. Refs. [51-53] reported that such
hybrid behavior results in a mixture of power-law and exponential distributions.
By contrast, objects in online systems are always passive: they have no choice
but to wait to be selected in order to gain popularity. Therefore, we assume
objects always grow by preferential attachment in our model.

We begin with some definitions for the bipartite graph that we will analyze. The
bipartite graph can be represented by $G = (U, O, E)$, where $U$ and $O$ are two
disjoint sets of nodes, respectively representing users and objects, and
$E \subseteq U \times O$ is the set of edges. The difference from a classical
graph lies in the fact that edges exist only between user vertices and object
vertices. The model starts from an initial bipartite network with $u_0$ nodes in
$U$, $o_0$ nodes in $O$ and $e_0$ edges in $E$. Given a user $i$ in $U$ and an
object $j$ in $O$, denote $k_i$ as the degree of $i$ and $l_j$ as the degree of
$j$ in the bipartite network. Then,

$$e_0 = \sum_i k_i = \sum_j l_j \quad (k_i, l_j \ge 1;\ i = 1, 2, \ldots, u_0;\ j = 1, 2, \ldots, o_0).$$

There are in total $N = u_0 + t$ users and $M = o_0 + t$ objects in the model at
time $t$. The model evolves as follows:

• adding a new user: Connect the new user node to m different nodes already in O
  with preferential probability $\frac{l_j}{\sum_{j=1}^{o_0+t-1} l_j}$.

• adding a new object: Link the new object node to n different nodes already in U
  with preferential probability $\frac{k_i}{\sum_{i=1}^{u_0+t-1} k_i}$.



1 http://www.delicious.com/
2 http://www.citeulike.com






An Evolving model of online bipartite networks 



• edges evolving randomly: Old nodes of the two kinds are connected by c new
  edges, chosen as follows: users are selected at random with probability
  $\frac{1}{u_0+t}$, while objects in O are selected with preferential
  probability $\frac{l_j}{\sum_{j=1}^{o_0+t-1} l_j}$.

• edges evolving by preferential attachment: Old nodes of the two kinds are
  connected by b new edges, chosen as follows: users are selected with
  preferential probability $\frac{k_i}{\sum_{i=1}^{u_0+t-1} k_i}$, and objects
  are also selected with preferential probability
  $\frac{l_j}{\sum_{j=1}^{o_0+t-1} l_j}$.
Fig. 1: (Color online) Object degree distribution on a log-log scale for
Delicious and CiteULike: (a) real data, (b) theory, and (c) simulation.
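
The four evolution rules of the model (new user, new object, c random edges, b preferential edges per time step) can be sketched as a small simulation. This is a minimal illustration rather than the code used for the simulations reported here: the fully connected seed network, the default parameter values, the function name `simulate` and the decision not to filter repeated user-object pairs in the last two rules are all assumptions made for the sketch.

```python
import random


def simulate(steps, m=2, n=2, c=1, b=1, u0=5, o0=5, seed=0):
    """Grow the bipartite network for `steps` time steps.

    Returns (k, l): lists of user degrees and object degrees.
    """
    rng = random.Random(seed)
    # Assumed seed network: u0 users and o0 objects, fully connected.
    k = [o0] * u0
    l = [u0] * o0
    # One list entry per edge endpoint: a uniform draw from these lists
    # selects a node with probability proportional to its degree.
    user_ends = [i for i in range(u0) for _ in range(o0)]
    obj_ends = [j for j in range(o0) for _ in range(u0)]

    def link(i, j):
        k[i] += 1
        l[j] += 1
        user_ends.append(i)
        obj_ends.append(j)

    def preferential(ends, limit):
        """Degree-proportional draw restricted to nodes with id < limit."""
        while True:
            node = rng.choice(ends)
            if node < limit:
                return node

    def distinct(ends, count, limit):
        """Draw `count` different old nodes preferentially."""
        chosen = set()
        while len(chosen) < count:
            chosen.add(preferential(ends, limit))
        return chosen

    for _step in range(steps):
        old_users, old_objects = len(k), len(l)
        # Rule 1: a new user links to m different old objects (preferential).
        k.append(0)
        for j in distinct(obj_ends, m, old_objects):
            link(old_users, j)
        # Rule 2: a new object links to n different old users (preferential).
        l.append(0)
        for i in distinct(user_ends, n, old_users):
            link(i, old_objects)
        # Rule 3: c edges, user uniform at random, object preferential.
        for _edge in range(c):
            link(rng.randrange(old_users), preferential(obj_ends, old_objects))
        # Rule 4: b edges, user and object both preferential.
        for _edge in range(b):
            link(preferential(user_ends, old_users),
                 preferential(obj_ends, old_objects))
    return k, l
```

Calling `k, l = simulate(2000)` returns the two degree sequences, from which the user and object degree histograms can be tabulated. Each time step adds exactly m + n + c + b edges, which is the growth rate used in the analysis.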



Analytical Analysis. 

Object degree distribution. From the aforementioned model description, we can
write the dynamics of the degree of object $O_j$ as

$$\frac{\partial l_j}{\partial t} = m \frac{l_j}{\sum_{j=1}^{o_0+t-1} l_j} + c \frac{l_j}{\sum_{j=1}^{o_0+t-1} l_j} + b \frac{l_j}{\sum_{j=1}^{o_0+t-1} l_j}, \tag{1}$$

where $\sum_{j=1}^{o_0+t-1} l_j = \langle l \rangle M$, $M = o_0 + t$ and
$\langle l \rangle = \frac{(m+n+c+b)t + e_0}{o_0 + t}$. Then Eq. (1) is
approximated by

$$\frac{\partial l_j}{\partial t} = \frac{w l_j}{v t}, \tag{2}$$

where $w = m + c + b$, $v = m + n + c + b$, $t \gg m, n, c, b$ and
$j = 1, 2, \ldots, t$.

The initial degree of node $j$ satisfies $l_j(t_j) = n$, where $t_j$ is the time
at which node $j$ is added to $O$. Solving Eq. (2), we therefore obtain

$$l_j(t) = n \left(\frac{t}{t_j}\right)^{w/v}. \tag{3}$$

Let $l_j(t) < l$; then $t_j > t (n/l)^{v/w}$. So the cumulative probability
$P(l_j(t) < l)$ can be written as

$$P(l_j(t) < l) = P\left(t_j > t \left(\frac{n}{l}\right)^{v/w}\right). \tag{4}$$

In the model, all nodes are added to the network at equal time intervals, which
means

$$p(t_j) = \frac{1}{o_0 + t}. \tag{5}$$

Combining Eq. (4) and Eq. (5), we obtain the cumulative probability

$$P(l_j(t) < l) = P\left(t_j > t \left(\frac{n}{l}\right)^{v/w}\right) = 1 - \frac{t}{o_0 + t} \left(\frac{n}{l}\right)^{v/w}. \tag{6}$$

Finally, assuming $t \gg m, n, c, b$, the object degree distribution can be
written as

$$p(l) = \frac{\partial P(l_j(t) < l)}{\partial l} \simeq \frac{v}{w}\, n^{v/w}\, l^{-(1 + v/w)}. \tag{7}$$

From Eq. (7), the object degree distribution follows a power law with exponent
$\gamma = 1 + \frac{v}{w}$.
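
As a sanity check on the step from Eq. (2) to Eq. (3), the growth equation can be integrated numerically and compared with the closed form. The sketch below is illustrative only: the function names, the forward-Euler scheme and the parameter values in the usage note are assumptions, not part of the model definition.

```python
def object_degree(t, t_j, n, w, v):
    """Closed form of Eq. (3): l_j(t) = n * (t / t_j) ** (w / v)."""
    return n * (t / t_j) ** (w / v)


def object_degree_euler(t, t_j, n, w, v, steps=100_000):
    """Forward-Euler integration of Eq. (2), dl/dt = w * l / (v * t)."""
    h = (t - t_j) / steps
    l = float(n)  # initial condition l_j(t_j) = n
    for i in range(steps):
        s = t_j + i * h
        l += h * w * l / (v * s)
    return l


def object_exponent(m, n, c, b):
    """Power-law exponent gamma = 1 + v / w read off from Eq. (7)."""
    w = m + c + b
    v = m + n + c + b
    return 1 + v / w
```

For example, with the assumed values m = n = 2 and c = b = 1 one has w = 4 and v = 6, so `object_degree(100.0, 10.0, 2, 4, 6)` and `object_degree_euler(100.0, 10.0, 2, 4, 6)` should agree closely, and `object_exponent(2, 2, 1, 1)` gives 2.5. Since v = w + n is never smaller than w, this form always yields an exponent of at least 2.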

User degree distribution. Similar to the theoretical analysis of the object
degree distribution, the dynamics of the degree of user $U_i$ can be written as

$$\frac{\partial k_i}{\partial t} = n \frac{k_i}{\sum_{i=1}^{u_0+t-1} k_i} + c \frac{1}{N} + b \frac{k_i}{\sum_{i=1}^{u_0+t-1} k_i}, \tag{8}$$

where $N = u_0 + t$ and

$$\sum_{i=1}^{u_0+t-1} k_i = \langle k \rangle N, \qquad \langle k \rangle = \frac{(m+n+c+b)t + e_0}{u_0 + t}. \tag{9}$$

Then Eq. (8) is approximated by

$$\frac{\partial k_i}{\partial t} = \frac{u k_i}{v t} + \frac{c}{N},$$

where $u = n + b$, $v = m + n + c + b$, $t \gg m, n, c, b$ and
$i = 1, 2, \ldots, t$.

The initial degree of every user satisfies $k_i(t_i) = m$, where $t_i$ is the
time at which user $U_i$ is added to $U$. Solving this equation, with
$N \simeq t$ for large $t$, we get

$$k_i(t) = \frac{\left(t/t_i\right)^{u/v} (cv + mu) - cv}{u}. \tag{10}$$

Substituting $p(t_i) = \frac{1}{u_0 + t}$ into Eq. (10), we get the cumulative
probability

$$P(k_i(t) < k) = 1 - \frac{t}{u_0 + t} \left(\frac{cv + mu}{cv + ku}\right)^{v/u}. \tag{11}$$

So the user degree distribution is finally obtained, assuming
$t \gg m, n, c, b$:

$$p(k) = \frac{\partial P(k_i(t) < k)}{\partial k} \simeq v\, (cv + mu)^{v/u}\, (cv + ku)^{-(1 + v/u)}. \tag{12}$$

From Eq. (12) we know that the user degree distribution is a mixture of
exponential and power-law forms [51-53], which is now commonly known as the
stretched exponential distribution.
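
The closed forms in Eqs. (10)-(12) are easy to evaluate directly, which also allows a numerical check that the density in Eq. (12) is the derivative of the cumulative probability in Eq. (11) for large t. The following is a minimal sketch; the function names and the parameter values in the usage note are assumptions, and the formulas require n + b > 0.

```python
def user_degree(t, t_i, m, n, c, b):
    """Eq. (10): degree at time t of a user who joined at time t_i."""
    u, v = n + b, m + n + c + b
    return ((t / t_i) ** (u / v) * (c * v + m * u) - c * v) / u


def user_cdf(k, t, u0, m, n, c, b):
    """Eq. (11): cumulative probability P(k_i(t) < k)."""
    u, v = n + b, m + n + c + b
    return 1 - t / (u0 + t) * ((c * v + m * u) / (c * v + k * u)) ** (v / u)


def user_pdf(k, m, n, c, b):
    """Eq. (12): large-t user degree density p(k)."""
    u, v = n + b, m + n + c + b
    return v * (c * v + m * u) ** (v / u) * (c * v + k * u) ** (-(1 + v / u))
```

With the assumed values m = n = 2 and c = b = 1 (so u = 3 and v = 6), Eq. (10) reduces to k_i(t) = 4 (t / t_i)^(1/2) - 2, which equals m = 2 at t = t_i, and Eq. (12) reduces to p(k) = 864 / (6 + 3k)^3. A central finite difference of `user_cdf` at large t should match `user_pdf` at the same k.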

Results & Analysis. - In this section, we use two data sets to evaluate the
proposed model. The first one is Delicious, one of the most popular social
bookmarking web sites, which allows users not only to store and organize personal
bookmarks, but also to look into other users' collections and find what they
might be interested in [55]. The other is CiteULike, which has characteristics
similar to those of Delicious. Table 1 shows the basic statistical properties of
the two data sets.

Degree distributions. Fig. 1 reports the object degree distributions. It can be
seen that both the simulation and analytical results fit well with the real data.
In addition, all the object degree distributions are power laws,
$p(l) \sim l^{-\gamma}$, with $\gamma = 3.50$ and $2.22$ for Delicious and
CiteULike, respectively.

For the user degree distribution, we focus on the cumulative degree distribution,
which Fig. 2 illustrates. Again, we find good agreement among the simulation,
analytical and empirical results, in particular at the tail of the distribution.
Therefore, by assuming mixed user behavior, the present model can qualitatively
reproduce general real-world networks. The degree distributions for all users are
similar to a stretched exponential distribution $p(k)$ with exponent
$0 < c < 1$.



Table 1: Basic statistical properties of Delicious and CiteULike. $|U|$, $|O|$
and $|E|$ denote the number of users, objects and edges, respectively.
$\rho = \frac{|E|}{|U||O|}$ denotes the sparsity of the data.

Data set     |U|       |O|        |E|          ρ
Delicious    9,998     232,657    123,995      5.305 x 10^-?
CiteULike    42,801    397,536    7,083,253    4.163 x 10^-4



Understanding the effects of random and preferential attachment. From the above
analysis, the user degree distribution is determined jointly by the preferential
and random linking mechanisms. In order to further understand the effects of
these two mechanisms,
Fig. 2: (Color online) The cumulative degree distribution of users on a log-log
scale for Delicious and CiteULike: (a) real data, (b) theory, and (c) simulation.



we introduce a structural parameter, $p \in [0,1]$, to quantify their relative
weights. Denote $p$ as the weight of the preferential mechanism, so that $1 - p$
is the weight of the random-choice mechanism. According to the model
description, we
have $p = \frac{n+b}{n+b+c}$. Fig. 3 shows the theoretical cumulative user degree
distribution for different $p$.




Fig. 3: (Color online) The theoretical cumulative user degree distribution on a
log-log scale for different p: (a) p=0.2 (black); (b) p=0.4 (red); (c) p=0.6
(blue); (d) p=0.8 (green).

As shown in Fig. 3, a clear correlation between $p$ and the user cumulative
degree distribution is observed. In addition, the scale-free region grows as $p$
increases, which indicates that $p$ can indeed characterize the different
structures driven by the two mechanisms. That is to say, in the extreme cases,
$p = 1$ produces a pure scale-free degree distribution and $p = 0$ generates an
exponential degree distribution. Otherwise, a stretched exponential decay is
observed for $p \in (0,1)$. The cumulative form of the stretched exponential
distribution is $P(k) \sim e^{-(k/k_0)^{c_0}}$, where $k_0$ is a constant and
$0 < c_0 < 1$ is the characteristic exponent. The scale-free region of $P(k)$
grows as $c_0$ decreases. This suggests a positive correlation between $1 - p$
and $c_0$, such as $1 - p \sim c_0$. The exponent $c_0$ can be determined from
the cumulative distribution $P(k) \sim e^{-(k/k_0)^{c_0}}$, which can be
rewritten as $\log(-\log P(k)) \sim c_0 \log k$ [15]. If the corresponding curve
is well fitted by a straight line, then its slope is $c_0 = a(1 - p)$, where $a$
is a scale factor. Fig. 4 reports this result.
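
The straight-line fit of log(-log P(k)) against log k described above can be sketched with an ordinary least-squares slope. This is an illustrative example on synthetic data, not the fitting procedure used for Fig. 4: the function name and the values k0 = 20 and c0 = 0.6 are assumptions, and every cumulative probability passed in must lie strictly between 0 and 1.

```python
import math


def fit_stretched_exponent(ks, cum_probs):
    """Least-squares slope of log(-log P(k)) against log k."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(-math.log(p)) for p in cum_probs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic check with assumed values k0 = 20 and c0 = 0.6.
ks = list(range(1, 201))
cum_probs = [math.exp(-((k / 20.0) ** 0.6)) for k in ks]
c0_hat = fit_stretched_exponent(ks, cum_probs)
```

On this synthetic input the relation is exactly linear, so `c0_hat` recovers the assumed exponent up to floating-point error. On empirical data the quality of the straight-line fit should be inspected before the slope is interpreted as $c_0$.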



mainly by preferential mechanism. The results from real data, theory and
simulation agree well with one another. In addition, we compare the weights of
the two mechanisms and find a clear correlation between the structural parameter
and the shape of the user cumulative degree distribution. Our proposed model
might shed some light on the underlying laws governing the structure of real
online bipartite networks.